Basic Syntax: Operators, Functions, and Objects
At its most basic, you can use R as a calculator: type out
mathematical operations in the script section after you have opened a
script (the field in the top left of the interface), highlight the chunk
of code you want to run, then press the Run button. You
will get the result in the console section (the field in the bottom left
of the RStudio interface). You can also run your code with
Cmd+Enter on a Mac (Ctrl+Enter on Windows or Linux) instead of pressing Run.
2+2
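Note that R is not sensitive to white space within a line: 2+2, 2 + 2, and
2 +2 are equivalent. Similarly, you can continue your code on a new line,
as long as the first line is visibly unfinished (for example, it ends with
an operator or has an unclosed parenthesis); otherwise R treats the line
as a complete expression.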
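As a small illustration of how line breaks behave, the sketch below splits the same sum over two lines in two ways. In both cases R keeps reading because the first line is not yet a complete expression.

```r
# The first line ends with an operator, so R keeps reading onto the next line
2 +
  2

# An open parenthesis also tells R that the expression is not finished yet
(2
  + 2)
```

By contrast, if the first line is already complete (a 2 on its own, followed by + 2 on the next line), R evaluates the two lines separately.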
The mathematical operators in R are pretty intuitive:
-3 * 5
(20-6) / 2
7^2
64^(0.5)
R evaluates mathematical operators in the order you’ve been taught at
school: parentheses first, then exponentiation, then multiplication and
division (which rank equally and are evaluated left to right), then
addition and subtraction (also equal, also left to right). This is the
PEMDAS rule. So, if you want to compute \(4+\frac{2}{5+13}\), you
would run
4+2/(5+13)
whereas if you want to compute \(\frac{4+2}{5+13}\), you would run
(4+2)/(5+13)
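A few extra cases, sketched here with arbitrary numbers, show how R handles operators of the same rank and where the minus sign fits in:

```r
8 / 4 * 2    # same rank, left to right: (8 / 4) * 2 gives 4
10 - 4 + 3   # same rank, left to right: (10 - 4) + 3 gives 9
-3^2         # exponentiation happens before the minus sign: gives -9
(-3)^2       # parentheses apply the minus sign first: gives 9
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.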
This is a good place to introduce your first functions.
Functions in R are called by name, followed by parentheses; within
these parentheses you pass one or more arguments. The
following lines of code are some simple mathematical functions, where
you pass one numerical argument and they return the result of an
operation:
abs(-10) #computes the absolute value
sqrt(81) #computes the square root
log(4) #computes natural logarithm
exp(0.5) #computes the exponential function (e to the power of a number)
Sometimes you have to pass more than one argument to the function. We
then use commas to separate the arguments:
log(8, base = 2) #computes the logarithm of a number in base 2
round(123.456789, digits = 2) #rounds a number to the 2nd decimal place
round(-123.456789, digits = 4) #rounds a number to the 4th decimal place
When you become familiar with a function and the arguments it takes,
you won’t need to specify the argument names, as long as you pass the
arguments in the order the function expects. In many ways, R
becomes simpler to use the more familiar you are with the functions. For
instance, the expressions above are equivalent to:
log(8, 2) #computes the logarithm of a number in base 2
round(123.456789, 2) #rounds a number to the 2nd decimal place
round(-123.456789, 4) #rounds a number to the 4th decimal place
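The difference between named and unnamed arguments can be sketched as follows. The argument names x, base and digits are the ones listed on the help pages for log() and round().

```r
# With names, the order of the arguments does not matter
log(base = 2, x = 8)                # 3, same as log(8, base = 2)
round(digits = 2, x = 123.456789)   # 123.46

# Without names, R matches arguments by position, so the order matters
log(8, 2)   # logarithm of 8 in base 2: 3
log(2, 8)   # logarithm of 2 in base 8: one third
```

Naming the arguments is a little more typing, but it protects you from swapping them by accident.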
If you want to know more about what a function does, you can type
? followed by the function’s name and run the code, or use the
help() function. Either call will open the help window in the
bottom right of your interface, with descriptions and examples of how
the function can be used. For instance:
?log
help(log)
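If you only need a quick reminder of which arguments a function takes, base R's args() function is a lighter alternative to the full help page. A minimal sketch:

```r
# args() prints a function's arguments and their default values in the console
args(log)   # shows that log() takes x and base, and that base defaults to exp(1)
```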
Using functions can look complicated, but it is really just a matter of
combining simple building blocks. You can nest functions and
operations within each other in a single expression; R evaluates the
innermost part first:
sqrt(10*5-1)
round(log(5+5)*2, digits = 3)
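To see what happens inside a nested call, it can help to rebuild it one layer at a time. The sketch below unpacks the second expression above from the inside out:

```r
# Evaluating round(log(5+5)*2, digits = 3) from the inside out
5 + 5                            # innermost operation: 10
log(10)                          # natural logarithm of that result
log(10) * 2                      # multiplied by 2
round(log(10) * 2, digits = 3)   # rounded to three decimal places: 4.605
```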
It is often useful to “store” values as named objects, using the
all-important <- (assign) operator.
These values can be numbers or character strings; in the latter case you
have to surround the text with quotation marks.
Note that = also works as an assignment operator.
However, its use for assignment is discouraged, because you are going to use the
equal sign in other contexts to do other things. You will see it used in
other people’s code, but to make things as unambiguous as possible, get
used to using only <- for assignment.
x <- 7
y <- 4+4
my_course <- "Data Analysis in R"
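The two roles of the equal sign can be sketched side by side. The name x_alt is only a placeholder for this illustration.

```r
x <- 7                    # assignment with the recommended operator
x_alt = 7                 # = on its own line also assigns
x == x_alt                # TRUE: both objects hold the same value

log(x = 100, base = 10)   # inside a call, = names the arguments of log(): 2
x                         # still 7: the call above did not touch the stored object
```

Keeping <- for assignment and = for arguments makes it obvious at a glance which of the two is happening.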
When you assign values to a named object, R does not return the value
in the console, but it will appear in the global
environment window in the top right of the interface. This means
that you have successfully stored this value, and at any point you can
simply run your object’s name (e.g. my_course) and R will
return the value you have assigned to it.
Note that the object stays available from any script for the rest of
your R session. Note also that an object’s name cannot contain spaces,
so instead of using my course, you want to use my_course with an
underscore.
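A short sketch of the naming rules follows. The second object, My_Course, is hypothetical and only there to show that R treats upper and lower case as different; make.names() is a base R helper that converts arbitrary text into a valid name.

```r
my_course <- "Data Analysis in R"
My_Course <- "Another course"   # a different object: names are case-sensitive
my_course == My_Course          # FALSE

# make.names() shows how base R would repair an invalid name
make.names("my course")         # "my.course": the space is not allowed
make.names("2nd_course")        # "X2nd_course": names cannot start with a digit
```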
You can pass named objects as arguments of functions or operations,
just as we did numbers:
sqrt(x)
(x*y)/2
But it makes no sense to add a number to a text string, so running
my_course+5
will throw an error. Rather, you’d want to return the object itself:
print(my_course) # or simply
my_course
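The sketch below shows one way to inspect what kind of value an object holds, and what happens when text meets arithmetic. The name outcome and the replacement message are assumptions made for the example; tryCatch() is used only so that the error does not stop the script.

```r
my_course <- "Data Analysis in R"

class(my_course)   # "character": text, not a number
nchar(my_course)   # number of characters in the string: 18

# tryCatch() captures the error and returns our own message instead
outcome <- tryCatch(my_course + 5,
                    error = function(e) "cannot add a number to text")
outcome

# paste() is the base R tool for combining text with other values
paste(my_course, 5)   # "Data Analysis in R 5"
```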
The Building Blocks: Vectors and Dataframes
In data analysis, we normally work with variables taking a series of
values across a number of observations rather than a single unit only.
R handles these sequences as vectors, and you can
create your own vectors using the c() function (combine) -
in fact, you have already been using vectors all along: the objects we
worked with so far are simply vectors of length 1!
Naturally, we can also create larger vectors by combining several
values:
prime_numbers <- c(2,3,5,7,11)
zero_to_ten <- c(0,1,2,3,4,5,6,7,8,9,10)
friends <- c("Rachel", "Monica", "Joey", "Chandler", "Ross", "Phoebe")
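One thing worth checking when you build a vector with c() is that all its elements end up as the same type. In the sketch below, mixed is a hypothetical vector created only to show what R does when types are combined.

```r
mixed <- c(1, "two", TRUE)     # three different types in one call
mixed                          # "1" "two" "TRUE": everything became text
class(mixed)                   # "character"

class(c(2, 3, 5))              # "numeric"
class(c("Rachel", "Monica"))   # "character"
```

So if a vector of numbers unexpectedly behaves like text, a stray quoted value is a likely culprit.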
Some other ways to create vectors:
one_to_onehundred <- 1:100
#creates a vector with all integers between 1 and 100
the_decimals <- seq(0, 1, by = 0.1)
#creates a vector with all numbers from 0 to 1 by intervals of 0.1
ones <- rep(1, 100)
#creates a vector repeating the first argument (1), n times,
#where n is the second argument (100).
lots_of_friends <- rep(friends, 5)
#as above but here the first argument is the vector "friends".
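A quick way to check that these shortcuts produced what you expected is to look at the length of the result. The last two lines use the optional arguments length.out and each, which are part of base R's seq() and rep() but are not used in the text above.

```r
friends <- c("Rachel", "Monica", "Joey", "Chandler", "Ross", "Phoebe")

length(1:100)                 # 100
length(seq(0, 1, by = 0.1))   # 11: 0, 0.1, 0.2, and so on up to 1
length(rep(friends, 5))       # 30: six names repeated five times

seq(0, 1, length.out = 5)     # five evenly spaced values: 0.00 0.25 0.50 0.75 1.00
rep(c("a", "b"), each = 2)    # repeats each element in turn: "a" "a" "b" "b"
```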
You can now apply the functions and operations described above to a
whole vector. The function is applied to each element of the original
vector, and R returns a new vector containing the results:
squares_of_primes <- prime_numbers^2
ten_to_twenty <- zero_to_ten+10
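The same element-by-element logic applies to functions and to operations between two vectors of equal length. A minimal sketch, where the second vector is made up for the example:

```r
prime_numbers <- c(2, 3, 5, 7, 11)

prime_numbers^2                           # 4 9 25 49 121
sqrt(prime_numbers)                       # one square root per element
prime_numbers + c(10, 20, 30, 40, 50)     # pairs up the elements: 12 23 35 47 61
```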
Other functions return a single value computed from all the elements
in the vector:
length(zero_to_ten) #returns the number of items in a vector
sum(zero_to_ten) #returns the sum of elements of the vector
mean(zero_to_ten) #returns the mean of elements of the vector
median(zero_to_ten) #returns the median of elements of the vector
max(zero_to_ten) #returns the maximum value in the vector
min(zero_to_ten) #returns the minimum value in the vector
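For reference, here is what some of these calls return for zero_to_ten, followed by a case you are likely to meet with real data: a missing value (NA). The vector with_gap is hypothetical, and na.rm is an optional argument of mean() and several of the other summary functions.

```r
zero_to_ten <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

length(zero_to_ten)   # 11
sum(zero_to_ten)      # 55
mean(zero_to_ten)     # 5
range(zero_to_ten)    # 0 10: minimum and maximum in one call

with_gap <- c(4, NA, 8)        # a vector with one missing value
mean(with_gap)                 # NA: the result is unknown if any element is
mean(with_gap, na.rm = TRUE)   # 6: missing values are dropped first
```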
You can also ask R to evaluate each element of a vector according to
a logical expression. This will return a vector of logical values
TRUE/FALSE. Stick a pin on this because it will be very
useful when we index and subset data frame variables later. For
instance:
zero_to_ten > 6 #is the element larger than six?
zero_to_ten >= 4 #is the element greater than or equal to four?
zero_to_ten == 1 #is the element a one?
Note the double equals operator used in the last
example. Annoyingly, when you want to ask R whether an object is equal to
something, you have to use ==; the single = is
meant for naming arguments within function calls. Using = instead of
== is probably the most common mistake for new
learners and experienced coders alike, and it does not always produce an
error: zero_to_ten = 1 would silently overwrite the vector. So when R
spits out an error or a result looks wrong, double-check the equals
signs in your code.
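As a brief preview of why logical vectors are useful, the sketch below counts and extracts the elements that pass a test, then shows the single = pitfall on a copy of the vector. The names is_large and copy_of_vector are placeholders for this example.

```r
zero_to_ten <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

is_large <- zero_to_ten > 6
sum(is_large)             # TRUE counts as 1, so this counts the matches: 4
zero_to_ten[is_large]     # keeps only the matching elements: 7 8 9 10

# The = pitfall: this does not compare anything, it overwrites the copy
copy_of_vector <- zero_to_ten
copy_of_vector = 1
copy_of_vector            # 1
```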
Finally, you can use the unique() function to get a
vector of the unique elements occurring in a vector, and the
table() function to see how many times a value occurs.
fruit <- c("apple", "pear", "pear", "orange",
"banana", "apple", "pear", "banana",
"pear", "apple", "apple", "banana")
unique(fruit)
table(fruit)
#note that the output of table() is not a vector, but a different class of object, a table.
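Once the counts are stored in an object, you can look up a single value by name or sort the whole table. In this sketch, fruit_counts is a name chosen for the example.

```r
fruit <- c("apple", "pear", "pear", "orange",
           "banana", "apple", "pear", "banana",
           "pear", "apple", "apple", "banana")

fruit_counts <- table(fruit)
class(fruit_counts)                     # "table"
fruit_counts[["pear"]]                  # 4
sort(fruit_counts, decreasing = TRUE)   # most frequent values first
length(unique(fruit))                   # 4 distinct fruits
```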
Just as vectors are collections of values, we can create
collections of vectors: dataframes. You can do so by passing
your vectors to the data.frame() function. By default, your
vectors will be treated as columns, and the vector names will become
column names.
zero_to_ten
squares <- zero_to_ten^2
my_dataframe <- data.frame(zero_to_ten, squares)
#visualise my_dataframe in the console simply by calling:
my_dataframe
#visualise my_dataframe in a viewer window:
View(my_dataframe)
#you can pass as many arguments to data.frame() as you want:
roots <- sqrt(zero_to_ten)
my_dataframe <- data.frame(zero_to_ten, squares, roots)
#you can give your dataframe column names within the data.frame() function, for instance:
my_dataframe <- data.frame(zero_to_ten, cubes = zero_to_ten^3)
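Beyond printing the whole dataframe, a few base R functions give a quick overview of its shape and contents. A minimal sketch, rebuilding a two-column version of my_dataframe so that the block runs on its own:

```r
zero_to_ten <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
my_dataframe <- data.frame(zero_to_ten, squares = zero_to_ten^2)

nrow(my_dataframe)       # number of rows: 11
ncol(my_dataframe)       # number of columns: 2
names(my_dataframe)      # "zero_to_ten" "squares"
str(my_dataframe)        # compact summary of each column's type and first values
head(my_dataframe, 3)    # first three rows only
my_dataframe$squares     # the $ operator extracts one column as a vector
```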
Note that when you create an object with the same name as an
existing object (in this case, my_dataframe), the previously saved
object will be overwritten - if in doubt, choose a new
name!
Be careful about the lengths of your vectors when using the
data.frame() function. If the vectors are not the same length but the
longer length is a whole multiple of the shorter, the shorter vector
will be repeated (R calls this “recycling”) until it fills the column.
If the longer length is not a whole multiple of the shorter, you will
not be able to bind them into a single data frame:
length(fruit)
#what's the length of my vector?
data.frame(fruit, binary_variable = c(0,1))
#the second vector is of length 2, which is a divisor of 12, so it gets replicated.
#data.frame(fruit, binary_variable = c(0,1,0,0,1))
#the second vector is of length 5, which is *not* a divisor of 12, so you get an error.
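If you would like to see both outcomes without interrupting your script, one option is to wrap the failing call in tryCatch(). This is only a sketch: the names recycled and result are placeholders, and the exact wording of the error message may differ between R versions.

```r
fruit <- c("apple", "pear", "pear", "orange",
           "banana", "apple", "pear", "banana",
           "pear", "apple", "apple", "banana")

# Length 2 divides 12, so the short vector is repeated six times
recycled <- data.frame(fruit, binary_variable = c(0, 1))
nrow(recycled)               # 12
recycled$binary_variable     # 0 1 0 1 0 1 0 1 0 1 0 1

# Length 5 does not divide 12, so data.frame() stops with an error;
# tryCatch() returns the error message as text instead
result <- tryCatch(
  data.frame(fruit, binary_variable = c(0, 1, 0, 0, 1)),
  error = function(e) conditionMessage(e)
)
result
```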
Dataframes can take text, numerical and logical vectors. As long as
they’re the same length, you’re good to go:
students <- c("Student A", "Student B",
"Student C", "Student D")
grade <- c(55, 70, 35, 81)
pass <- grade >= 50
data.frame(students, grade, pass)
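Storing the dataframe in an object (here called results, a name chosen for this sketch) lets you check the type of each column and use the logical column to summarise or filter the others. Note that in R versions before 4.0 the text column would be stored as a factor by default rather than as character.

```r
students <- c("Student A", "Student B",
              "Student C", "Student D")
grade <- c(55, 70, 35, 81)
pass <- grade >= 50

results <- data.frame(students, grade, pass)

sapply(results, class)           # the type of each column
sum(results$pass)                # number of students who passed: 3
mean(results$grade)              # average grade: 60.25
results$students[results$pass]   # names of the students who passed
```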
To recap: values can be numbers (1, 2, 3), text strings (“banana”,
“apple”) or logical values (TRUE, FALSE).
Vectors are collections of values of the same type that you combine with the
c() function. Dataframes are collections of equal-length vectors that
you create with the data.frame() function.